Automatic Location and Separation of Records: A Case Study in the Genealogical Domain

Authors

  • Troy Walker
  • David W. Embley
Abstract

Locating specific chunks (records) of information within web documents is an interesting and nontrivial problem. If records can be reliably located and separated, the longstanding problem of grouping extracted values into the appropriate relationships of a record structure becomes much easier to solve. Our solution is a hybrid of two well-established techniques: (1) ontology-based extraction [ECJ99] and (2) vector space modeling [SM83]. To show that the technique has merit, we apply it to the particularly challenging task of locating and separating records in genealogical web documents, which vary considerably in layout and format. In our experiments, the technique achieved an average of 92% recall and 93% precision for locating and separating genealogical records in web documents.
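The abstract names the two ingredients but does not say how they are combined. As a rough illustration of the vector-space half only, the sketch below assumes that ontology-based extraction has already labeled each candidate chunk of a page with the concepts it recognized (names, dates, places). It represents each chunk as a vector of concept counts and scores the chunk by cosine similarity against a profile of a typical record. The concept list, `RECORD_PROFILE`, the 0.8 threshold, and all function names are hypothetical; this is not the authors' method.

```python
import math
from collections import Counter

# HYPOTHETICAL concept names standing in for the paper's genealogy ontology.
CONCEPTS = ["Name", "BirthDate", "DeathDate", "Spouse", "Place"]

# HYPOTHETICAL profile of a "typical" record: counts per concept, same order.
RECORD_PROFILE = [1, 1, 1, 1, 2]


def to_vector(concept_hits):
    """Turn the concept labels recognized in one chunk into a count vector.

    Labels that are not in CONCEPTS are ignored.
    """
    counts = Counter(concept_hits)
    return [counts[c] for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_record(concept_hits, threshold=0.8):
    """Accept a chunk as a record if it is close enough to the profile.

    The 0.8 threshold is an arbitrary illustrative value.
    """
    return cosine(to_vector(concept_hits), RECORD_PROFILE) >= threshold


if __name__ == "__main__":
    full = ["Name", "BirthDate", "Place", "DeathDate", "Place", "Spouse"]
    places_only = ["Place", "Place", "Place"]
    print(is_record(full))         # True
    print(is_record(places_only))  # False (cosine is about 0.71)
```

In this framing, a chunk that mentions only places scores well below the profile and might be rejected, while chunks close to the profile would be kept as separate records. How candidate chunks are formed in the first place is left open here.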

Similar resources

Providing a structural model for psychological problems based on disconnection and rejection domain and negative automatic thoughts with mediating role of experimental avoidance

Introduction: Psychological problems are the result of a person's interaction with the environment and include behaviors that cause social conflicts, dissatisfaction and individual unhappiness. The present study aimed to provide a structural model for psychological problems based on disconnection and rejection domain and negative automatic thoughts with mediating role of experimental avoidance....

A comparative study of fractal models and U-statistic method to identify geochemical anomalies; case study of Avanj porphyry system, Central Iran

The most significant aspect of a geochemical exploration program is to define and separate the anomalous values from the background. In the past decades, geochemical anomalies have been identified by means of various methods. Most of the conventional statistical methods aiming at defining the geochemical concentration thresholds for separating anomalies from the background have limited the effi...

L(+) lactic acid production and separation from dairy wastes (whey): in situ separation of lactic acid using ion-exchange resins under automatic pH control.

Whey, with a high BOD (50,000 ppm), is a dangerous environmental pollutant. This important source of lactose (4-5%) is a useful substrate for a range of fermentation processes. Lactic acid, which has several industrial applications, is one of these products. In particular, the L(+) isomer of this acid, worth 10 times as much as the L/D mixture, is used for medical purposes such as absorbable surgica...

Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents

This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...

Publication date: 2004